Vocabulary-informed Visual Feature Augmen-
Abstract
A natural solution for one-shot learning is to augment the training data to address the data deficiency problem. However, augmenting directly in the image domain may not generate training data that sufficiently explore the intra-class space for one-shot classification. Inspired by recent work on vocabulary-informed learning, we propose to generate synthetic training data guided by the semantic word space. Essentially, we train an auto-encoder as a bridge that enables transformation between the image feature space and the semantic space. Besides directly augmenting image features, we transform the image features into the semantic space using the encoder and perform data augmentation there. The decoder then synthesizes image features for the augmented instances from the semantic space. Experiments on three datasets show that our data augmentation method effectively improves the performance of one-shot classification. An extensive study shows that data augmented from the semantic space are complementary to those from the image space, and thus boost classification accuracy dramatically. Source code and dataset will be available.

1 MOTIVATION AND INTRODUCTION

The success of recent machine learning (especially deep learning) relies heavily on training with hundreds or thousands of labelled instances of each class. In practice, however, it may be extremely expensive or infeasible to obtain many labelled data, e.g. for objects in dangerous environments with limited access. On the other hand, humans can recognize an object category easily from only a few training examples Thrun (1996). Inspired by this human ability, one-shot learning aims at building classifiers from a few examples or even a single one. The major obstacle to learning good classifiers in the one-shot setting is the lack of sufficient training data. A natural recipe for one-shot learning is therefore to augment the data, which has been done in various ways.
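The pipeline outlined in the abstract (augment image features directly, and also encode them into the semantic space, augment there, and decode back) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration and not taken from the paper: the dimensions `FEAT_DIM` and `SEM_DIM`, the untrained linear maps `W_enc` and `W_dec` standing in for the trained auto-encoder, the helper names, and the use of Gaussian noise as the perturbation in the semantic space.

```python
import numpy as np

rng = np.random.default_rng(0)

FEAT_DIM, SEM_DIM = 8, 4  # hypothetical sizes, not from the paper

# Untrained linear stand-ins for the encoder and decoder (assumption:
# the paper trains an auto-encoder; its architecture is not given here).
W_enc = rng.normal(scale=0.1, size=(FEAT_DIM, SEM_DIM))
W_dec = rng.normal(scale=0.1, size=(SEM_DIM, FEAT_DIM))


def encode(x):
    """Map image features (n, FEAT_DIM) to the semantic space (n, SEM_DIM)."""
    return x @ W_enc


def decode(z):
    """Map semantic vectors (n, SEM_DIM) back to image features (n, FEAT_DIM)."""
    return z @ W_dec


def augment_feature_space(x, n_aug, sigma=0.05):
    """Add Gaussian noise directly to a single image feature vector."""
    noise = rng.normal(scale=sigma, size=(n_aug, x.shape[-1]))
    return x[None, :] + noise


def augment_semantic_space(x, n_aug, sigma=0.05):
    """Encode, perturb in the semantic space, then decode to image features.

    Gaussian noise is an illustrative choice of perturbation (assumption).
    """
    z = encode(x[None, :])  # shape (1, SEM_DIM)
    z_aug = z + rng.normal(scale=sigma, size=(n_aug, z.shape[-1]))
    return decode(z_aug)  # shape (n_aug, FEAT_DIM)


# One labelled example plus five synthetic instances from each space.
one_shot = rng.normal(size=FEAT_DIM)
train_set = np.vstack([
    one_shot[None, :],
    augment_feature_space(one_shot, n_aug=5),
    augment_semantic_space(one_shot, n_aug=5),
])
print(train_set.shape)  # (11, 8)
```

With a trained auto-encoder in place of the random matrices, the rows of `train_set` would all share the label of `one_shot` and could be passed to any standard classifier.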
The dominant approach adopted by previous work is to bring in more images Krizhevsky et al. (2012) for each category as training data. These additional training images can be borrowed from unlabelled data Fu et al. (2015) or from other relevant categories Wang & Hebert (2016a;b); Li & Hoiem (2016); Lim et al. (2011) in an unsupervised or semi-supervised fashion. However, the semantic signals of such augmented data are often noisy and unreliable, and may suffer from negative transfer when the augmented data come from different classes. On the other hand, synthetic images rendered from virtual examples Movshovitz-Attias (2015); Park & Ramanan (2015); Movshovitz-Attias et al. (2015); Dosovitskiy et al. (2015); Zhu et al. (2016b); Opelt et al. (2006) are semantically correct but require careful domain adaptation to transfer the knowledge to the real image domain.

In contrast, we propose to augment training data directly in the image feature domain rather than in the original image domain. Augmenting data in the image feature domain allows us to interact with useful discriminative signals more directly. The work most similar to ours is Zhu et al. (2016b); Opelt et al. (2006), where feature patches (e.g. HOG) of object parts are combined to synthesize new feature representations. However, their approach requires strong heuristics and spatial information to learn the combination. In contrast, we augment data in a compact, deep-learning-based feature space, which is stronger for classification but contains limited spatial information.

A straightforward approach to augmenting image features is to add random vectors to the feature of each single training image. However, the cutting plane for classification usually does not have a regular shape, e.g. a hyperball, and such a simple perturbation, e.g. one sampled from a Gaussian distribution, may not sufficiently explore the intra-class space of each individual category. Our idea of data
Similar resources
Vocabulary-informed Extreme Value Learning
The novel unseen classes can be formulated as the extreme values of known classes. This has inspired recent work on open-set recognition [32, 31, 28], which, however, has no way of naming the novel unseen classes. To solve this problem, we propose the Extreme Value Learning (EVL) formulation to learn the mapping from visual feature to semantic space. To model the margin and coverage distrib...
A Novel Method for Content Base Image Retrieval Using Combination of Local and Global Features
Content-based image retrieval (CBIR) has been an active research topic in the last decade. In this paper we propose an image retrieval method using global and local features. First, for local feature extraction, the SURF algorithm produces a set of interest points for each image and a set of 64-dimensional descriptors for each interest point, and then, to use the Bag of Visual Words model, a cluster...
Visual Word based Location Recognition in 3D models using Distance Augmented Weighting
For visual word based location recognition in 3D models we propose a novel distance-weighted scoring scheme. Matching visual words are not treated as perfect matches anymore but are weighted with the distance of the original SIFT feature vectors before quantization. To maintain the scalability and efficiency of vocabulary tree based approaches PCA compressed SIFT feature vectors are used instea...
Predictive Power of Involvement Load Hypothesis and Technique Feature Analysis across L2 Vocabulary Learning Tasks
Involvement Load Hypothesis (ILH) and Technique Feature Analysis (TFA) are two frameworks which operationalize depth of processing of a vocabulary learning task. However, there is a dearth of research comparing the predictive power of the ILH and the TFA across second language (L2) vocabulary learning tasks. The present study, therefore, aimed to examine this issue across four vocabulary learning...
Publication date: 2017